Faster Approximate(d) Text-to-Pattern L1 Distance

Author

  • Przemyslaw Uznanski
Abstract

The problem of finding the \emph{distance} between a \emph{pattern} of length $m$ and a \emph{text} of length $n$ is a typical way of generalizing pattern matching to incorporate a dissimilarity score. For both Hamming and $L_1$ distances, only a superlinear upper bound of $\widetilde{O}(n\sqrt{m})$ is known, which prompts the question of relaxing the problem: either by asking for a $(1 \pm \varepsilon)$-approximate distance (every distance is reported up to a multiplicative factor of $1 \pm \varepsilon$), or for a $k$-approximated distance (distances exceeding $k$ are reported as $\infty$). We focus on the $L_1$ distance, for which we show new algorithms achieving complexities of $\widetilde{O}(\varepsilon^{-1} n)$ and $\widetilde{O}((m+k\sqrt{m}) \cdot n/m)$, respectively. This is a significant improvement upon previous algorithms with runtimes $\widetilde{O}(\varepsilon^{-2} n)$, by Lipsky and Porat (Algorithmica 2011), and $\widetilde{O}(n\sqrt{k})$, by Amir, Lipsky, Porat and Umanski (CPM 2005). We also provide a series of reductions, showing that if our upper bound for approximate $L_1$ distance is tight, then so is our upper bound for $k$-approximated $L_1$ distance, and if the latter is tight, then so is the $k$-approximated Hamming distance upper bound due to Gawrychowski and Uznański (arXiv 2017).
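To make the two relaxations concrete, the following is a minimal sketch of the problem statement only. It uses the naive $O(nm)$ baseline, not any of the algorithms from the paper, and the function names `l1_distances`, `k_approximated` and `is_eps_approximation` are hypothetical, chosen for illustration.

```python
from math import inf


def l1_distances(text, pattern):
    """Exact L1 distance between the pattern and every length-m window of the text.

    Naive O(n*m) baseline; NOT the paper's algorithm.
    """
    n, m = len(text), len(pattern)
    return [
        sum(abs(text[i + j] - pattern[j]) for j in range(m))
        for i in range(n - m + 1)
    ]


def k_approximated(distances, k):
    """k-approximated output: distances exceeding k are reported as infinity."""
    return [d if d <= k else inf for d in distances]


def is_eps_approximation(exact, approx, eps):
    """Check that every reported distance is within a (1 +/- eps) factor of the exact one."""
    return len(exact) == len(approx) and all(
        (1 - eps) * d <= a <= (1 + eps) * d for d, a in zip(exact, approx)
    )


if __name__ == "__main__":
    text = [1, 5, 2, 8, 3]
    pattern = [2, 4]
    exact = l1_distances(text, pattern)
    print(exact)                     # exact distances, one per alignment
    print(k_approximated(exact, 4))  # distances above k=4 become inf
```

Any algorithm for the relaxed problems is expected to produce outputs of this shape; the point of the paper's results is to do so faster than the exact $\widetilde{O}(n\sqrt{m})$ bound.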


Similar resources

Approximate Matching in the L1 Metric

Let a text $T = t_0, \ldots, t_{n-1}$ and a pattern $P = p_0, \ldots, p_{m-1}$, strings of natural numbers, be given. In the Approximate Matching in the $L_\infty$ metric problem the output is, for every text location $i$, the $L_\infty$ distance between the pattern and the length-$m$ substring of the text starting at $i$, i.e. $\max_{j=0}^{m-1} |t_{i+j} - p_j|$. We consider the Approximate $k$-$L_\infty$ distance problem. Given text T and patter...

Online Pattern Matching for String Edit Distance with Moves

Edit distance with moves (EDM) is a string-to-string distance measure that includes substring moves in addition to ordinal editing operations to turn one string to the other. Although optimizing EDM is intractable, it has many applications especially in error detections. Edit sensitive parsing (ESP) is an efficient parsing algorithm that guarantees an upper bound of parsing discrepancies betwee...

New Models and Algorithms for Multidimensional Approximate Pattern Matching

We focus on how to compute the edit distance (or similarity) between two images and the problem of approximate string matching in two dimensions, that is, to find a pattern of size $m \times m$ in a text of size $n \times n$ with at most $k$ errors (character substitutions, insertions and deletions). Pattern and text are matrices over an alphabet of size $\sigma$. We present new models and give the first sublinear time se...

Pattern matching in pseudo real-time

It has recently been shown how to construct online, non-amortised approximate pattern matching algorithms for a class of problems whose distance functions can be classified as being local. Informally, a distance function is said to be local if for a pattern $P$ of length $m$ and any substring $T[i, i+m-1]$ of a text $T$, the distance between $P$ and $T[i, i+m-1]$ can be expressed as $\sum_j \Delta(P[j], T[i+j])$...

A Black Box for Online Approximate Pattern Matching

We present a deterministic black box solution for online approximate matching. Given a pattern of length $m$ and a streaming text of length $n$ that arrives one character at a time, the task is to report the distance between the pattern and a sliding window of the text as soon as the new character arrives. Our solution requires $O\left(\sum_{j=1}^{\log_2 m} T(n, 2^{j-1})/n\right)$ time for each input character, where T (n...

Journal:
  • CoRR

Volume abs/1801.09159

Publication date: 2018